Sinica Treebank: Design Criteria, Annotation Guidelines, and On-Line Interface

Authors

  • Chu-Ren Huang
  • Feng-Yi Chen
  • Keh-Jiann Chen
  • Zhao-Ming Gao
  • Kuang-Yu Chen

Abstract

This paper describes the design criteria and annotation guidelines of Sinica Treebank. The three design criteria are: Maximal Resource Sharing, Minimal Structural Complexity, and Optimal Semantic Information. One of the important design decisions following these criteria is the encoding of thematic role information. An on-line interface facilitating empirical studies of Chinese phrase structure is also described.

Similar resources

Design criteria, representational issues and implementation

This paper describes the design criteria and annotation guidelines of the Sinica Treebank. The three design criteria are: Maximal Resource Sharing, Minimal Structural Complexity, and Optimal Semantic Information. One of the important design decisions guided by these criteria is the encoding of thematic role information. We discuss the representational and methodological issues based on our desi...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

The Part-Of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0)

This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public. The POS tagging guidelines have been revised several tim...

An Annotation Scheme for a Persian Treebank

In this paper we present and justify methodological principles and syntactic criteria to design an annotation scheme for a Persian Treebank. The main approaches to the annotation of Treebanks are presented in order to account for the decisions taken. After examining these approaches, and taking into account the syntactic characteristics of Persian, the most appropriate one will be selected and its ...

Creating a Methodology for Large-Scale Correction of Treebank Annotation: The Case of the Arabic Treebank

The LDC Arabic Treebank team has significantly revised and enhanced its annotation guidelines and annotation procedures over the last two years, with the goal of reducing inconsistency in annotation in the Treebank. We have now completed automatic and significant manual revisions to 738,845 tokens/words in total, bringing them into line as far as possible with the new annotation guidelines and ...

Publication date: 2000